
    Comparison of CNN-Learned vs. Handcrafted Features for Detection of Parkinson's Disease Dysgraphia in a Multilingual Dataset

    Parkinson's disease dysgraphia (PDYS), one of the earliest signs of Parkinson's disease (PD), has been researched as a promising biomarker of PD and as the target of a noninvasive and inexpensive approach to monitoring the progress of the disease. However, although several approaches to supportive PDYS diagnosis have been proposed (mainly based on handcrafted features (HF) extracted from online handwriting or on deep neural networks), it remains unclear which approach provides the highest discrimination power and how these approaches can be transferred between different datasets and languages. This study aims to compare classification performance based on two types of features: features automatically extracted by a pretrained convolutional neural network (CNN) and HF designed by human experts. Both approaches are evaluated on a multilingual dataset collected from 143 PD patients and 151 healthy controls in the Czech Republic, United States, Colombia, and Hungary. The subjects performed the spiral drawing task (SDT; a language-independent task) and the sentence writing task (SWT; a language-dependent task). Models based on logistic regression and gradient boosting were trained in several scenarios, specifically single language (SL), leave-one-language-out (LOLO), and all languages combined (ALC). We found that the HF slightly outperformed the CNN-extracted features in all considered evaluation scenarios for the SWT. In detail, the following balanced accuracy (BACC) scores were achieved: SL: 0.65 (HF), 0.58 (CNN); LOLO: 0.65 (HF), 0.57 (CNN); and ALC: 0.69 (HF), 0.66 (CNN). However, in the case of the SDT, features extracted by a CNN provided competitive results: SL: 0.66 (HF), 0.62 (CNN); LOLO: 0.56 (HF), 0.54 (CNN); and ALC: 0.60 (HF), 0.60 (CNN). In summary, for the SWT, the HF outperformed the CNN-extracted features by over 6% (mean BACC of 0.66 for HF and 0.60 for CNN). In the case of the SDT, both feature sets provided almost identical classification performance (mean BACC of 0.60 for HF and 0.58 for CNN). Copyright © 2022 Galaz, Drotar, Mekyska, Gazda, Mucha, Zvoncak, Smekal, Faundez-Zanuy, Castrillon, Orozco-Arroyave, Rapcsak, Kincses, Brabenec and Rektorova
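
    As an illustration of the kind of evaluation described above (not code from the paper), the following minimal sketch runs a leave-one-language-out (LOLO) experiment with a logistic regression classifier scored by balanced accuracy; the feature matrix X, label vector y, and per-subject language array are hypothetical placeholders.

    # Minimal LOLO sketch with scikit-learn; X, y, and languages are assumed inputs.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import balanced_accuracy_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    def lolo_balanced_accuracy(X, y, languages):
        """Train on all languages except one, test on the held-out language."""
        scores = {}
        for lang in np.unique(languages):
            test_mask = languages == lang
            model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
            model.fit(X[~test_mask], y[~test_mask])
            scores[lang] = balanced_accuracy_score(y[test_mask], model.predict(X[test_mask]))
        return scores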

    Tools and approaches that used ML or DL approaches to analyze TEs.

    TIR-Learner uses a neural network, k-nearest neighbors, random forest, and AdaBoost for the ensemble method, while ClassifyTE uses k-nearest neighbors, extra trees, random forest, support vector machine, AdaBoost, logistic regression, gradient boosting, and XGBoost classifiers for the stacking method. Abbreviations: RFSB: random forest selective binary classifier; C: classification; D: detection; A: annotation; CL: curation of TE libraries; NI: novel insertions; TU: TransposonUltimate.
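
    As a rough, hypothetical sketch of the stacking idea attributed to ClassifyTE above (not the tool's actual implementation), the snippet below combines several of the listed base learners with a logistic-regression meta-learner in scikit-learn; XGBoost is omitted to keep the example dependency-free, and all estimator settings are illustrative.

    # Schematic stacking ensemble; configuration is illustrative only.
    from sklearn.ensemble import (AdaBoostClassifier, ExtraTreesClassifier,
                                  GradientBoostingClassifier, RandomForestClassifier,
                                  StackingClassifier)
    from sklearn.linear_model import LogisticRegression
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.svm import SVC

    base_learners = [
        ("knn", KNeighborsClassifier()),
        ("extra_trees", ExtraTreesClassifier(n_estimators=200)),
        ("random_forest", RandomForestClassifier(n_estimators=200)),
        ("svm", SVC(probability=True)),
        ("adaboost", AdaBoostClassifier()),
        ("gradient_boosting", GradientBoostingClassifier()),
    ]
    stacked_model = StackingClassifier(estimators=base_learners,
                                       final_estimator=LogisticRegression(max_iter=1000))
    # Usage: stacked_model.fit(X_train, y_train); stacked_model.predict(X_test)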

    Fig 1 -

    Internal structure and organization of LTR retrotransposons in plants for (A) the Ty1/copia superfamily and (B) the Ty3/gypsy superfamily. Depending on the position of the integrase (INT) domain, an element can be classified into the Ty1/copia or the Ty3/gypsy superfamily.
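
    The positional rule stated in the caption can be expressed as a short, hypothetical helper (the domain coordinates and how they are obtained are assumed): if INT lies upstream of the reverse transcriptase (RT) the element is Ty1/copia, and if it lies downstream it is Ty3/gypsy.

    # Toy classifier for the INT-position rule; coordinates are element-relative start positions.
    def classify_ltr_superfamily(domain_starts):
        """domain_starts: e.g. {'INT': 3200, 'RT': 4100} (hypothetical values)."""
        if "INT" not in domain_starts or "RT" not in domain_starts:
            return "unresolved"
        return "Ty1/copia" if domain_starts["INT"] < domain_starts["RT"] else "Ty3/gypsy"

    print(classify_ltr_superfamily({"INT": 3200, "RT": 4100}))  # Ty1/copia
    print(classify_ltr_superfamily({"RT": 2500, "INT": 5600}))  # Ty3/gypsy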

    Neural network architecture of YORO.

    Analysis of eukaryotic genomes requires the detection and classification of transposable elements (TEs), a crucial but complex and time-consuming task. To improve the performance of tools that accomplish these tasks, machine learning (ML) approaches that leverage computing resources such as GPUs (graphics processing units) and multiple CPU (central processing unit) cores have been adopted. However, until now, the use of ML techniques has mostly been limited to the classification of TEs. Herein, a detection-classification strategy (named YORO) based on convolutional neural networks is adapted from computer vision (YOLO) to genomics. This approach enables the detection of genomic objects through the prediction of their position, length, and classification in large DNA sequences such as fully sequenced genomes. As a proof of concept, the internal protein-coding domains of LTR retrotransposons are used to train the proposed neural network. Precision, recall, accuracy, F1-score, execution times, and time ratios, as well as several graphical representations, were used as metrics to measure performance. These promising results open the door to a new generation of deep learning tools for genomics. The YORO architecture is available at https://github.com/simonorozcoarias/YORO.
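
    To make the genomic object detection idea concrete, here is a generic 1D-CNN with a YOLO-style output head sketched in Keras; the window size, grid resolution, and layer sizes are assumptions for illustration and do not reproduce the published YORO architecture.

    # Each grid cell over a DNA window predicts [objectness, position offset, length, class scores].
    import tensorflow as tf

    WINDOW = 50000      # window length in bp (as in the figures)
    NUM_CLASSES = 6     # AP, GAG, ENV, INT, RT, RNaseH

    inputs = tf.keras.Input(shape=(WINDOW, 4))  # one-hot encoded DNA
    x = tf.keras.layers.Conv1D(32, 25, strides=10, padding="same", activation="relu")(inputs)
    x = tf.keras.layers.Conv1D(64, 11, strides=10, padding="same", activation="relu")(x)
    x = tf.keras.layers.Conv1D(128, 5, strides=5, padding="same", activation="relu")(x)  # 100 grid cells
    # Detection head: one prediction vector per grid cell.
    outputs = tf.keras.layers.Conv1D(3 + NUM_CLASSES, 1)(x)  # shape (100, 9)
    model = tf.keras.Model(inputs, outputs)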

    Performance of YORO in detecting internal domains of LTR retrotransposons using the Genomic Object Detection approach.

    (A) Loss function during model training. Parameters used: Adam optimizer, learning rate of 0.001, batch size of 128, 100 epochs, no dropout, data split: training (80%), validation (10%), testing (10%). (B) Precision-recall curve with TP (true positive), TN (true negative), FP (false positive), and FN (false negative) defined on a nucleotide basis. Only domain detection is considered, regardless of its classification. (C) Parity plot for the positions of the beginning of the domains. (D) Visualization of the domains in the 50,000 bp window (X-axis). The upper part corresponds to the predictions made by YORO; the lower part corresponds to the actual label. AP: aspartic protease (black); GAG: capsid protein (red); ENV: envelope (green); INT: integrase; RT: reverse transcriptase (blue); RNaseH: ribonuclease H (light blue).
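
    The nucleotide-level definition of TP/FP/FN used in panel (B) can be sketched as follows; the predicted and labelled domain masks are assumed to be boolean arrays with one entry per base of the window.

    # Precision/recall counted per nucleotide, ignoring the domain class.
    import numpy as np

    def nucleotide_precision_recall(pred_mask, true_mask):
        """pred_mask, true_mask: boolean arrays marking bases covered by a domain."""
        tp = np.sum(pred_mask & true_mask)
        fp = np.sum(pred_mask & ~true_mask)
        fn = np.sum(~pred_mask & true_mask)
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        return precision, recall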